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Abstract 

We address covariance estimation in the sense of minimum mean-squared error (MMSE) for Gaussian 
samples. Specifically, we consider shrinkage methods which are suitable for high dimensional problems 
with a small number of samples (large p small n). First, we improve on the Ledoit-Wolf (LW) method by 
conditioning on a sufficient statistic. By the Rao-Blackwell theorem, this yields a new estimator called 
RBLW, whose mean-squared error dominates that of LW for Gaussian variables. Second, to further reduce 
the estimation error, we propose an iterative approach which approximates the clairvoyant shrinkage 
estimator. Convergence of this iterative method is established and a closed form expression for the limit 
is determined, which is referred to as the oracle approximating shrinkage (OAS) estimator. Both RBLW 
and OAS estimators have simple expressions and are easily implemented. Although the two methods are 
developed from different persepctives, their structure is identical up to specified constants. The RBLW 
estimator provably dominates the LW method. Numerical simulations demonstrate that the OAS approach 
can perform even better than RBLW, especially when n is much less than p. We also demonstrate the 
performance of these techniques in the context of adaptive beamforming. 
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I. Introduction 

Covariance matrix estimation is a fundamental problem in signal processing and related fields. Many 
applications varying from array processing [12] to functional genomics [17] rely on accurately estimated 
covariance matrices. In recent years, estimation of high dimensional p x p covariance matrices under small 
sample size n has attracted considerable interest. Examples include classification on gene expression from 
microarray data, financial forecasting, spectroscopic imaging, brain activation mapping from fMRI and 
many others. Standard estimation methods perform poorly in these large p small n settings. This is the 
main motivation for this work. 

The sample covariance is a common estimate for the unknown covariance matrix. When it is invertible, 
the sample covariance coincides with the classical maximum likelihood estimate. However, while it is 
an unbiased estimator, it does not minimize the mean-squared error (MSE). Indeed, Stein demonstrated 
that superior performance may be obtained by shrinking the sample covariance [2], [3]. Since then, 
many shrinkage estimators have been proposed under different performance measures. For example, Haff 
[4] introduced an estimator inspired by the empirical Bayes approach. Dey and Srinivasan [5] derived 
a minimax estimator under Stein's entropy loss function. Yang and Berger [6] obtained expressions for 
Bayesian estimators under a class of priors for the covariance. These works addressed the case of invertible 
sample covariance when n> p. Recently, Ledoit and Wolf (LW) proposed a shrinkage estimator for the 
case n < p which asymptotically minimizes the MSE [8]. The LW estimator is well conditioned for small 
sample sizes and can thus be applied to high dimensional problems. In contrast to previous approaches, 
they show that performance advantages are distribution-free and not restricted to Gaussian assumptions. 

In this paper, we show that the LW estimator can be significantly improved when the samples are in fact 
Gaussian. Specifically, we develop two new estimation techniques that result from different considerations. 
The first follows from the Rao-Blackwell theorem, while the second is an application of the ideas of [11] 
to covariance estimation. 

We begin by providing a closed form expression for the optimal clairvoyant shrinkage estimator under 
an MSE loss criteria. This estimator is an explicit function of the unknown covariance matrix that can 
be used as an oracle performance bound. Our first estimator is obtained by applying the well-known 
Rao-Blackwell theorem [31] to the LW method, and is therefore denoted by RBLW. Using several 
nontrivial Haar integral computations, we obtain a simple closed form solution which provably dominates 
the LW method in terms of MSE. We then introduce an iterative shrinkage estimator which tries to 
approximate the oracle. This approach follows the methodology developed in [11] for the case of linear 
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regression. Beginning with an initial naive choice, each iteration is defined as the oracle solution when 
the unknown covariance is replaced by its estimate obtained in the previous iteration. Remarkably, a 
closed form expression can be determined for the limit of these iterations. We refer to the limit as the 
oracle approximating shrinkage (OAS) estimator. 

The OAS and RBLW solutions have similar structure that is related to a sphericity test as discussed in 
[18]-[20]. Both OAS and RBLW estimators are intuitive, easy to compute and perform well with finite 
sample size. The RBLW technique provably dominates LW. Numerical results demonstrate that for small 
sample sizes, the OAS estimator is superior to both the RBLW and the LW methods. 

To illustrate the proposed covariance estimators we apply them to problems of time series analysis 
and array signal processing. Specifically, in the context of time series analysis we establish performance 
advantages of OAS and RBLW to LW for covariance estimation in autoregressive models and in fractional 
Brownian motion models, respectively. In the context of beamforming, we show that RBLW and OAS 
can be used to significantly improve the Capon beamformer. In [12] a multitude of covariance matrix 
estimators were implemented in Capon beamformers, and the authors reported that the LW approach 
substantially improves performance as compared to other methods. We show here that even better 
performance can be achieved by using the techniques introduced in this paper. 

The paper is organized as follows. Section 2 formulates the problem. Section 3 introduces the oracle 
estimator together with the RBLW and OAS methods. Section 4 represents numerical simulation results 
and applications in adaptive beamforming. Section 5 summarizes our principal conclusions. The proofs 
of theorems and lemmas are provided in the Appendix. 

Notation: In the following, we depict vectors in lowercase boldface letters and matrices in uppercase 
boldface letters. (-) T and (-) H denote the transpose and the conjugate transpose, respectively. Tr (•), 
\\-\\ F , and det (•) are the trace, the Frobenius norm, and the determinant of a matrix, respectively. Finally, 
A -< B means that the matrix B — A is positive definite, and A y B means that the matrix A — B is 
positive definite. 

II. Problem formulation 

Let {xj}" =1 be a sample of independent identical distributed (i.i.d.) p-dimensional Gaussian vectors 
with zero mean and covariance S. We do not assume n > p. Our goal is to find an estimator T, ({xj}" =1 ) 
which minimizes the MSE: 




(1) 
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It is difficult to compute the MSE of S({xj}" =1 ) without additional constraints and therefore we 
restrict ourselves to a specific class of estimators that employ shrinkage [1], [7]. The unstructured classical 
estimator of £ is the sample covariance S defined as 



1 n 



n . 

1=1 



Xixf. (2) 



This estimator is unbiased E{S} = E, and is also the maximum likelihood solution if n > p. However, 
it does not necessarily achieve low MSE due to its high variance and is usually ill-posed for large p 
problems. On the other hand, we may consider a naive but most well-conditioned estimate for 



TV S 

F = (3) 

p 

This "structured" estimate will result in reduced variance with the expense of increasing the bias. A 
reasonable tradeoff between low bias and low variance is achieved by shrinkage of S towards F, resulting 
in the following class of estimators: 

£ = (1 - p)S + pF. (4) 

The estimator ~E is characterized by the shrinkage coefficient p, which is a parameter between and 1 
and can be a function of the observations {xj}™ =1 . The matrix F is referred to as the shrinkage target. 1 
Throughout the paper, we restrict our attention to shrinkage estimates of the form (4). Our goal is 
to find a shrinkage coefficient p that minimizes the MSE (1). As we show in the next section, the 
optimal p minimizing the MSE depends in general on the unknown and therefore in general cannot 
be implemented. Instead, we propose two different approaches to approximate the optimal shrinkage 
coefficient. 

III. Shrinkage algorithms 

A. The Oracle estimator 

We begin by deriving a clairvoyant oracle estimator that uses an optimal nonrandom coefficient to 
minimize the mean-squared error. In the following subsections we will show how to approximate the 
oracle using implementable data-driven methods. 

'The convex combination in (4) can be generalized to the linear combination of S and F. The reader is referred to [13] for 
further discussion. 
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The oracle estimate So is the solution to 



min 

p 



E 



S -S 



(5) 



s.t. S = (1 - p) S + pF 
The optimal parameter po is provided in the following theorem. 



Theorem 1. Let S be the sample covariance of a set of p-dimensional vectors {xj}" =1 . If {x-i}™ =1 are 
i.i.d. Gaussian vectors with covariance S, then the solution to (5) is 



PO 









-m 


E 


{ 


S-F 




\ 



;i-2/p)TV(S 2 ) +Tr 2 (S) 



(6) 



(7) 



(n + 1 - 2/p)Tr(S 2 ) + (1 - n/p) TV 2 (S 
Proof: Equation (6) was established in [7] for any distribution of {xj}™ =1 . Under the additional 
Gaussian assumption, (7) can be obtained from straightforward evaluation of the expectations: 

Tr(S) 



E 



{tv ((s -s) (f - S 

E\Tx 2 (§)} 



-E 



P 



{ Tr (s) } 



(8) 



P 



-£{Tr(ss)} + £{Tr(S 2 )}, 



and 



E 



S-F 



E {Tr (§ 2 ) } - 2E |TV (sf) } + E {Tr (f 2 ) } 



(9) 



^|Tr (S 2 )} - 



E 



p 



Equation (7) is a result of using the following identities [27]: 

^{Tr(s)}=Tr(S), 
£<|Tr (§ 2 )} 

and 



^±1tv (S 2 ) + iTr 2 (£) , 

n ' n 



E {Tr 2 (§) } = Tr 2 (£) + ^Tr (S 2 ) . 



(10) 
(11) 

(12) 



Note that (6) specifies the optimal shrinkage coefficient for any sample distribution while (7) only 
holds for the Gaussian distribution. 
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B. The Rao-Blackwell Ledoit-Wolf (RBLW) estimator 

The oracle estimator defined by (5) is optimal but cannot be implemented, since the solution specified 
by both (6) and (7) depends on the unknown £. Without any knowledge of the sample distribution, 
Ledoit and Wolf [7], [8] proposed to approximate the oracle using the following consistent estimate of 
(6): 



PLW 



E 

i=i 



XjX- 



T 



(13) 



n 2 p (^S 2 j - Tr 2 (SJ jp 

They then proved that when both n,p — > oo and p/n — > c, < c < oo, (13) converges to (6) in 
probability regardless of the sample distribution. The LW estimator XIlw is then defined by plugging 
Plw into (4). In [8] Ledoit and Wolf also showed that the optimal po (6) is always between and 1. 
To further improve the performance, they suggested using a modified shrinkage parameter 



Plw = min (Plw A) 



(14) 



instead of Plw- 

The Rao-Blackwell LW (RBLW) estimator described below provably improves on the LW method 
under the Gaussian model. The motivation for the RBLW originates from the fact that under the Gaussian 
assumption on {xj}" =1 , a sufficient statistic for estimating £ is the sample covariance S. Intuitively, the 
LW estimator is a function of not only S but other statistics and therefore, by the Rao-Blackwell theorem, 
can be improved. Specifically, the Rao-Blackwell theorem [31] states that if g(X) is an estimator of a 
parameter 9, then the conditional expectation of g(X) given T(X), where T is a sufficient statistic, is 
never worse than the original estimator g(X) under any convex loss criterion. Applying the Rao-Blackwell 
theorem to the LW estimator yields the following result. 

Theorem 2. Let {xj}™ =1 be independent p-dimensional Gaussian vectors with covariance X, and let S 
be the sample covariance of {xj}" =1 . The conditioned expectation of the LW covariance estimator is 



■'RBLW 



= E 



■•LW 



where 



= (1 - PRBLw)S + PRBLWF 

(n - 2)/n • Tr (s 2 ) + TV 2 (§ 



PRBLW 



(n + 2) Tr (S 2 ) - Tr 2 (sj /p 



(15) 
(16) 

(17) 
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This estimator satisfies 

E 

for every S. 



^RBLW — S 











s 




Sw — S 











(18) 



The proof of Theorem 2 is given in the Appendix. 
Similarly to the LW estimator, we propose the modification 

Prblw = min (prblw, 1) (19) 

instead of prblw- 

C. The Oracle-Approximating Shrinkage ( OAS) estimator 

The basic idea of the LW estimator is to asymptotically approximate the oracle, which is designed for 
large sample size. For a large number of samples the LW asymptotically achieves the minimum MSE 
with respect to shrinkage estimators. Clearly, the RBLW also inherits this property. However, for very 
small n, which is often the case of interest, there is no guarantee that such optimality still holds. To 
illustrate this point, consider the extreme example when only one sample is available. For n = 1 we have 
both p* LW = 1 and Prblw = 1> which indicates that ^lw = ^rblw = S. This however contradicts 
our expectations since if a single sample is available, it is more reasonable to expect more confidence to 
be put on the more parsimonious F rather than S. 

In this section, we consider an alternative approach to approximate the oracle estimator based on [11]. 
In (7), we obtained a closed-form formula of the oracle estimator under Gaussian assumptions. The idea 
behind the OAS is to approximate this oracle via an iterative procedure. We initialize the iterations with 
an initial guess of S and iteratively refine it. The initial guess So might be the sample covariance, 
the RBLW estimate or any other symmetric non-negative definite estimator. We replace S in the oracle 
solution by So yielding Si, which in turn generates S2 through our proposed iteration. The iteration 
process is continued until convergence. The limit, denoted as Soas, is the OAS solution. Specifically, 
the proposed iteration is, 

(l-2/p)Tr(s,s)+Tr 2 (%) 

pj+i = /A A J v y 9/ ^ v (20) 

(n + 1 - 2/p)TV (S^Sj + (1 - n/p) Tr 2 (S^J 
S j+1 = (l-p j+1 )S + p j+1 F. (21) 
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Comparing with (7), notice that in (20) Tr(E) and Tr(£ 2 ) are replaced by Tr(Sj) and Tir(EjS), 
respectively. Here Tr(SjS) is used instead of Tr(£ 2 ) since the latter would always force pj to converge 
to 1 while the former leads to a more meaningful limiting value. 

Theorem 3. For any initial guess po that is between and 1, the iterations specified by (20), (21) 
converge <xs j — ► oo to the following estimate: 



where 



^OAS = (1 - P*oas) S + P*OAS F > 
(l-2/p)Tr (s 2 ) +Tr 2 (s 



POAS = mm 



(n + l-2/p) Tr S 2 - Tr 2 S ) /p 



1 • 



In addition, < Pqas — 1- 



Proof: Plugging in Sj from (21) into (20) and simplifying yields 



Pj+i 



1 - (1 - 2/p)<t> Pj 



where 



1 + n<j) - (n + 1 - 2/p)#j 

Tr (§ 2 ) - Tr 2 (§) /p 
Tt(%A +Tr 2 (§ 



Since Tr(S 2 ) > Tr 2 (S)/p, < <f> < 1. Using a simple change of variables 

f 1 



1 - (n + 1 - 2/p)<f>pj 
(24) is equivalent to the following geometric series 

- n<p f I 



-6,+ 



It is easy to see that 



oo, 



1 - (1 - 2/p)<j> 1 - (1 - 2/p)4> 

n<p 



lim bj = < 

j— >oo 



1 



1 - (1 - 2/p)<f> 
n<f) 



> 1 



< 1 



[ 1 - (n + 1 - 2/p)0 1 - (1 - 2/p)0 
Therefore /3j also converges as j ^ oo and Pqas * s gi yen by 

'' (n + 1 - 2/p)0 > 1 



POAS = ^h={ (^+ 1 - 2 /^ 

1 (n + 1 - 2/p)0 < 1 



(22) 



(23) 



(24) 



(25) 



(26) 



(27) 



(28) 



(29) 
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We can write (29) equivalently as 

p*o A o = min ( -, 1 | . (30) 

° S \{n+l-2/p)^ J 

Equation (23) is obtained by substituting (25) into (29). ■ 

Note that (29) Pqas * s naturally bounded within [0, 1]. This is different from p* LW and P*rblw> wriere 

the [0, 1] constraint is imposed in an ad hoc fashion. 

D. Shrinkage and sphericity statistics 

We now turn to theoretical comparisons between RBLW and OAS. The only difference is in their 
shrinkage coefficients. Although derived from distinct approaches, it is easy to see that p* OAS shares the 
same structure as p* RBLW - In fact, they can both be expressed as the parameterized function 

p* E = min + 1^ (31) 

with U defined as 




U = — 1 \ (32) 



For p* E = Pqa S , a and (5 of (31) are given by 

a = aoAS = 



while for p* E = p* RBLW : 



n + l- 2/p 

^ , 03) 

fi = 0OAS= (n + l-2/p)(p-l) 
n-2 

a = ctRBLW 



n ( n + 2 ) ^ 

P = PRBLW = 7- 7 oV 7T 

n(n + 2)(p — 1) 

Thus the only difference between Pq^s an ^ Prblw ^ s tne choice of a and (3. The statistic U arises in 
tests of sphericity of E [19], [20], i.e., testing whether or not E is a scaled identity matrix. In particular, 
U is the locally most powerful invariant test statistic for sphericity under orthogonal transformations [18]. 
The smaller the value of U, the more likely that is proportional to an identity matrix I. Similarly, in 
our shrinkage algorithms, the smaller the value of U, the more shrinkage occurs in ~Eqas an d Erblw- 
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IV. Numerical Simulations 

In this section we implement and test the proposed covariance estimators. We first compare the 
estimated MSE of the RBLW and OAS techniques with the LW method. We then consider their application 
to the problem of adaptive beamforming, and show that they lead to improved performance of Capon 
beamformers. 

A. MSE Comparison 

To test the MSE of the covariance estimators we designed two sets of experiments with different 
shapes of S. Such covariance matrices have been used to study covariance estimators in [10]. We use 
(14), (19) and (23) to calculate the shrinkage coefficients for the LW, the RBLW and the OAS estimators. 
For comparison, the oracle estimator (5) uses the true £ and is included as a benchmark lower bound on 
MSE for comparison. For all simulations, we set p = 100 and let n range from 6 to 30. Each simulation 
is repeated 5000 times and the MSE and shrinkage coefficients are plotted as a function of n. The 95% 
confidence intervals of the MSE and shrinkage coefficients were found to be smaller than the marker 
size and are omitted in the figures. 

In the first experiment, an autoregressive covariance structured £ is used. We let £ be the covariance 
matrix of a Gaussian AR(1) process [32], 

S^ = rl^'l, (35) 

where denotes the entry of £ in row i and column j. We take r = 0.1, 0.5 and 0.9 for the different 
simulations reported below. Figs. l(a)-3(a) show the MSE of the estimators for different values of r. 
Figs. l(b)-3(b) show the corresponding shrinkage coefficients. 

In Fig. 4 we plot the MSE of the first three iterations obtained by the iterative procedure in (21) and 
(20). For comparison we also plot the results of the OAS and the oracle estimator. We set r = 0.5 in 
this example and start the iterations with the initial guess £o = S. From Fig. 4 it can be seen that as the 
iterations proceed, the MSE gradually decreases towards that of the OAS estimator, which is very close 
to that of the oracle. 

In the second experiment, we set £ as the covariance matrix associated with the increment process of 
fractional Brownian motion (FBM) exhibiting long-range dependence. Such processes are often used to 
model internet traffic [29] and other complex phenomena. The form of the covariance matrix is given by 

=« = \ [(I* - J\ + I) 2 " - 2|< - j\ 2H + (\i - j\ - lf H ] , (36) 
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n 

(a) MSE 




0.8 1 1 1 1 ' 

5 10 15 20 25 30 

n 

(b) Shrinkage coefficients 
Fig. 1. AR(1) process: Comparison of covariance estimators when p — 100, r = 0.1. 

where H G [0.5, 1] is the so-called Hurst parameter. The typical value of H is below 0.9 in practical 
applications. We choose H equal to 0.6, 0.7 and 0.8. The MSE and shrinkage coefficients are plotted in 
Figs. 5(a)-7(a) and Figs. 5(b)-7(b), respectively. 

From the simulation results in the above two experiments, it is evident that the OAS estimator performs 
very closely to the ideal oracle estimator. When n is small, the OAS significantly outperforms the LW 
and the RBLW. The RBLW improves slightly upon the LW, but this is not obvious at the scale of the 
plots shown in the figures. As expected, all the estimators converge to a common value when n increases. 

As indicated in (5) and shown from simulation results, the oracle shrinkage coefficient po decreases 
in the sample number n. This makes sense since (1 — po) can be regarded as a measure of "confidence" 
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(a) MSE 




5 10 15 20 25 30 



(b) Shrinkage coefficients 
Fig. 2. AR(1) process: Comparison of covariance estimators when p — 100, r = 0.5. 



assigned to S. Intuitively, as more observations are available, one acquires higher confidence in the sample 
covariance S and therefore po decreases. This characteristic is exhibited by p* Q AS ^ ut not by Prblw 
and p* LW - This may partially explain why OAS outperforms RBLW and LW for small samples. All the 
estimators perform better when the sphericity of S increases, which corresponds to small values of r 
and H. 

Our experience through numerous simulations with arbitrary parameters suggests that in practice the 
OAS is preferable to the RBLW. However, as the RBLW is provably better than the LW there exists 
counter examples. For the incremental FBM covariance I] in (36), we set H = 0.9, n = 20, p = 100. 
The simulation is repeated for 5000 times and the result is shown in Table 1, where MSE(5] rblw) < 
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n 

(b) Shrinkage coefficients 
Fig. 3. AR(1) process: Comparison of covariance estimators when p — 100, r = 0.9. 

MSE(So J 4s) < MSE(Siv^)- The differences are very small but establish that the OAS estimator does 
not always dominate the RBLW. However, we suspect that this will only occur when 5] has a very small 
sphericity, a case of less interest in practice as small sphericity of 5] would suggest a different shrinkage 
target than F. 

B. Application to the Capon beamformer 

Next we applied the proposed shrinkage estimators to the signal processing problem of adaptive 
beamforming. Assume that a narrow-band signal of interest s{t) impinges on an unperturbed uniform 
linear array (ULA) [30] comprised of p sensors. The complex valued vector of n snapshots of the array 
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Fig. 4. AR(1) process: Comparison of MSE in different iterations, when p — 100, r — 0.5 

TABLE I 

Incremental FRM process: comparison of MSE and shrinkage coefficients when H = 0.9, n = 20, p = 100. 





MSE 


Shrinkage coefficient 


Oracle 


428.9972 


0.2675 


OAS 


475.2691 


0.3043 


RBLW 


472.8206 


0.2856 


LW 


475.5840 


0.2867 



output is 

x(i) = a(0 a )s(i) +n(t), for t = l,...,n, (37) 

where S is parameter vector determining the location of the signal source and a(9) is the array response 
for a generic source location 9. Specifically, 

a (0) = [l, e -*", e~ j2u; , e -j(p-^f^ (38) 

where uj is the spatial frequency. The noise/interference vector n(t) is assumed to be zero mean i.i.d. 
Gaussian distributed. We model the unknown s(t) as a zero mean i.i.d. Gaussian process. 

In order to recover the unknown s(t) the Capon beamformer [30] linearly combines the array output 
x(t) using a vector of weights w, calculated by 

E -1 a(0 a ) 

w = — (39) 
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where I] is the covariance of x(t). The covariance I] is unknown while the array response a(9) and 
the source direction-of-arrival (DOA) 9 S are known. After obtaining the weight vector w, the signal of 
interest s(t) is estimated by w^x(i). 

To implement (39) the matrix £ needs to be estimated. In [12] it was shown that using the LW 
estimator could substantially improve Capon beamformer performance over conventional methods. As 
we will see below, the OAS and the RBLW shrinkage estimators can yield even better results. 

Note that the signal and the noise processes are complex valued and I] is thus a complex (Hermitian 
symmetric) covariance matrix. To apply the OAS and RBLW estimators we use the same approach as 
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Fig. 6. Incremental FBM process: Comparison of covariance estimators when p — 100, H — 0.7. 



used in [12] to extend the real LW covariance estimator to the complex case. Given a p x 1 complex 
random vector x, we represent it as a 2p x 1 vector of its real and imaginary parts 

/ 



X. 



Re(x) 
^Im (x)^ 

Then the estimate of the complex covariance can be represented as 

/■ 



(40) 



(41) 



where S rr , S r j, Sj r and are p x p sub-matrices. The real representation (41) can be mapped to the 
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(b) Shrinkage coefficients 
Fig. 7. Incremental FBM process: Comparison of covariance estimators when p — 100, H — 0.8. 



full complex covariance matrix £ as 

£ = (s rr + E«) + j ($ ir - S ri ) . (42) 

Using this representation we can easily extend the real valued LW, RBLW and OAS estimators to complex 
scenarios. 

We conduct the beamforming simulation as follows. A ULA of p = 10 sensor elements with half 
wavelength spacing is assumed and three signals were simulated as impinging on the array. The signal 
of interest has a DOA 6 S = 20° and a power a 2 s = 10 dB above the complex Gaussian sensor noise. The 
other two signals are mutually independent interferences. One is at DOA angle of 9n = —30° and the 
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n 

Fig. 8. Comparison between different covariance shrinkage estimators in the Capon beamformer. SINR is plotted versus number 
of snapshots n, OAS achieves as much as 1 dB improvement over the LW. 



other one is close to the source of interest with its angular location corresponding to a spatial frequency 
of 

LO i2 = irsm(9 s ) + 27T- 

p 

where 7 is set to 0.9. Each signal has power 15 dB above the sensor noise. 

We implemented the complex versions of the LW, the RBLW and the OAS covariance estimators, 
described above, and used them in place of £ in the Capon beamformer expression (39). The beamforming 
performance gain is measured by the SINR defined as [12] 

where K is the number of Monte-Carlo simulations and is the weight vector obtained by (39) in the 
fcth simulation. Here K = 5000 and n varies from 10 to 60 in step of 5 snap shots. The gain is shown 
in Fig. 8. In [12] it was reported that the LW estimator achieves the best SINR performances among 
several contemporary Capon-type beamformers. It can be seen in Fig. 8 that the RBLW and the OAS do 
even better, improving upon the LW estimator. Note also that the greatest improvement for OAS in the 
small n regime is observed. 

V. Conclusion 

In this paper, we introduced two new shrinkage algorithms to estimate covariance matrices. The RBLW 
estimator was shown to improve upon the state-of-the-art LW method by virtue of the Rao-Blackwell 
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theorem. The OAS estimator was developed by iterating on the optimal oracle estimate, where the limiting 
form was determined analytically. The RBLW provably dominates the LW, and the OAS empirically 
outperforms both the RBLW and the LW in most experiments we have conducted. The proposed OAS 
and RBLW estimators have simple explicit expressions and are easy to implement. Furthermore, they 
share similar structure differing only in the form of the shrinkage coefficients. We applied these estimators 
to the Capon beamformer and obtained significant gains in performance as compared to the LW Capon 
beamformer implementation. 

Through out the paper we set the shrinkage target as the scaled identity matrix. The theory developed 
here can be extended to other non-identity shrinkage targets. An interesting question for future research 
is how to choose appropriate targets in specific applications. 

VI. Appendix 

In this appendix we prove Theorem 2. Theorem 2 is non-trivial and requires careful treatment using 
results from the theory of Haar measure and singular Wishart distributions. The proof will require several 
intermediate results stated as lemmas. We begin with a definition. 

Definition 1. Let {xj}™ =1 be a sample of p-dimensional i.i.d. Gaussian vectors with mean zero and 
covariance ~E. Define a p x n matrix X as 



where H is a px r matrix such that H H = 1, A is a r xr diagonal matrix in probability 1, comprised 
of the singular values of X, and Q is a r x n matrix such that QQ T = I. 

Next we state and prove three lemmas. 

Lemma 1. Let (H, A, Q) be matrices defined in Definition 1. Then Q is independent ofH and A. 

Proof: For the case n < p, Hisapxn matrix, A is a n x n square diagonal matrix and Q is a 
n x n orthogonal matrix. The pdf of X is 



X= (xi,x 2 ,...,x n ). 



(44) 



Denote r = min(p, n) and define the singular value decomposition on X as 



X = HAQ, 



(45) 



p(X) 



1 



iTr(XX T E- 1 



(46) 



(2vr)W2det(S)"/2 
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Since XX T = HAA T H T , the joint pdf of (H, A, Q) is 
p(H,A,Q) = 



e -^(HAATE->) J(x ^ H]A)Q)) 



(47) 



(27r)P«/ 2 det(S)™/ 2 

where J (X — > H, A, Q) is the Jacobian converting from X to (H, A, Q). According to Lemma 2.4 of 
[21], 

J(X -» H, A, Q) = 

(48) 

2-"det(A)P-" [J W " A *) 5n,P (H) 5 n,n (Q) , 

j<fc 

where Xj denotes the j-th diagonal element of A and g n:P (H) and g n ,n(Q) are functions of H and Q 
defined in [21]. 

Substituting (48) into (47), p (H, A, Q) can be factorized into functions of (H, A) and Q. Therefore, 
Q is independent of H and A. 

Similarly, one can show that Q is independent of H and A when n > p. ■ 

Lemma 2. Let Q be a matrix defined in Definition 1. Denote q as an arbitrary column vector of Q and 
qj as the j-th element of q. Then 

(49) 



and 



E U'} 

E{qlq 2 3 } 



n(n + 2) 
1 



(50) 



n(n + 2) 

Proof: The proof is different for the cases that n < p and n > p, which are treated separately. 

(1 ) Case n < p: 

In this case, Q is a real Haar matrix and is isotropically distributed [22], [24], [25], i.e., for any unitary 
matrices $ and * which are independent with Q, <I>Q and have the same pdf of Q: 

p(*Q)=p(Q*)=p(Q). (51) 

Following [23] in the complex case, we now use (51) to calculate the fourth order moments of elements 
of Q. Since Q and 



cos v sin u 
- sin cos 9 



Q 
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are also identically distributed, we have 

£{Qn} 

= £{(Qiicos# + Q 2 ismfl) 4 } 

= cos 4 sin 4 0£{Qk} (52) 

+ 6cos 2 6»sin 2 9E {Q^Q^} 

+ 2 cos 3 9 sin 6E { Q? 1 Q 2 i } + 2 cos 9 sin 3 0£ { QnQ^ } 
By taking 9 = — 6 in (52), it is easy to see that 

2cos 3 #sin6>£{Q? 1 Q 2 i} + 2 cos 9 sin 3 9E {Q11Q21} = 0. 
The elements of [Qu] are identically distributed. We thus have E {Qn} = E {Q22}, an d hence 

£{Qn} 

(53) 

= (cos 4 9 + sin 4 9)E{Qj 1 } + 6 cos 2 9 sin 2 9E {QnQ^i} • 

By taking 9 = ir/3, 

E {Q 4 n } = 3E {Q 2 n Q 2 21 } . (54) 
Now we consider E j (£"=1 Qji) 2 }• Since Q T Q = QQ T = X ' E Qji = 1- This implies 

i = X>{QM + EMQ, 2 iQL} 

i=i (55) 
= n£{Q 4 1 }+n(n-l)£{Q 2 1 Q 2 1 }. 
Substituting (54) into (55), we obtain that 

k J n(n + 2) 

and 

£ { Q " Q -i = ^)^ (57) 

It is easy to see that E |g 4 } = E {Qn} and E {<? 2 ^} = E {Q?iQ|i}. Therefore (49) and (50) are 
proved for the case of n < p. 

(2) Case n > p: 
The pdf of q can be obtained by Lemma 2.2 of [21] 

p(q) = CWet (I - qq ^)(- p - 2 )/ 2 /(qf f -< I), (58) 
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where 



Ci = 



vr-P/ 2 r{n/2} 



(59) 



T{{n-p)/2Y 

and /(•) is the indicator function specifying the support of q. Eq. (58) indicates that the elements of q 
are identically distributed. Therefore, E j<7 4 j = E [qf} and £7 jq^j = E{qfq%}. By the definition 
of expectation, 



E{qf} = C 1 [ gfdet(l- q q r ) (n - p - 2)/2 dq, 



and 



Noting that 



and 



we have 



E {qhl} = Ci / ihldet (I - qq r ) {n ^ 2)/2 dq. 

Jqq T ^I 



qq T -< I «4> q T q < 1 



det (I - qq T ) = 1 - q T q, 



E{qf} = C 1 I ^(l-q^-^dq 

-/q T q<l 



i(n-p-2) 



Ci L qf\ 1 - V I dgi . . . dg p 



By changing variable of integration (qi, q 2 , ■ ■ ■ , q p ) to (r, 1; 6> 2 , • • • , p _i) such that 

qi = r cos #i 
^2 = r sin 0i cos 02 
q% = r sin #i sin 02 cos 6*3 

q p -i = r sin^i sin^2 • • • sin p _2 cos p _i 
c/p = r sin^i sin^2 • • • sin p _2 sin^ p _i 

2tt 



we obtain 



/»7T rn />Zn 

E {qf} =C X \ d0 1 d0 2 --- d0 p -2 / d0 p 
Jo Jo Jo Jo 

■f 

Jo 



r 4 cos 4 ^ (l-r 2 y (n - p - 2) 



d(qi, ■■■ ,q P ) 



d(r,e u --- ,0 p _i) 



dr, 



where 



<9 (<?!,••■ ,q P ) 



d{r,9 u --- 



= r^ 1 sin p - 2 0i sin^ 3 2 ■ ■ ■ sin p „ 2 



(60) 



(61) 



(62) 



(63) 



(64) 



(65) 



(66) 
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is the Jacobian associated with the change of variable 
Therefore, 



E {qf} = d • [ cos 4 0i sir/" 2 M#i ■ / sin p - 3 2 d02 
Jo Jo 

• / sin^ 4 3 ^ 3 -- - / smO p - 2 dO p -2 / d0 P -i 

■/ ■/ JO 

• l\^{l-r 2 Y {n - p - 2) dr 
Jo 

vr-P/ 2 r{n/2} 37ri - l)/2} i r{(p-2)/2} 
r{(n-p)/2}' 4 r{(p + 4)/2} '^ 2 r{( P -i)/2} 
i r{(p-3)/2} i r{3/2} i r{i} 

• 7T 2 • • • 7T 2 • 7T 2 • 27T 

T{(p - 2)/2} T{5/2} T{3/2} 

• /V 3 (l-r 2 )" (n - p - 2 W 

JO 

- 3 r W2} /"W 3 ri-r 2 ^ (n - p - 2) dr 

"2r{(n-p)/2}r{p/2 + 2}y ^ ' 

_ 3 r>/2} i r{(n-p)/2}r{p/2 + 2} 

" 2 T{(n - p)/2}r{p/2 + 2} ' 2 T{n/2 + 2} 

3r{n/2} 

~ 4r{n/2 + 2} 

3 

~ n(n + 2)' 



(67) 
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Similarly, 



/ p \ 5("-P- 2 ) 

J E ql<l V *=i / 

/*7T /*7T /"7T /»27T 

= Ci / 6Bx\ d0 2 --- dO p - 2 / cW p _i 
Jo Jo Jo Jo 

■ / r 2 cos 2 9\r 2 sin 2 ^cos 2 62(1 — r 2 ) ; 
Jo 



d(qi,--- ,q P ) 



dr 



d(r,e 1: --- ,e p -i} 

= Ci ■ / cos 2 0i sin p • / cos 2 fl 2 sin p " 3 2 d0 2 
Jo j 

P7Y f'TT PIT 

■ sm p - 4 6 3 d6 3 - / sin p - 5 M^4--- / sin0 p _ 2 d0 p _ 2 

JO JO JO 

Jo Jo 

■K-P/ 2 T{n/2} tt3 l)/2} vrir{(p-2)/2} 

" r{(n- P )/2} ' T r{ P /2 + 2} ' ir{(p + i)/2} 
i r{(p-3)/2> i r{(p-4)/2} i r{i} 



r{( P -2)/2} r{( P -3)/2} r{3/2} 

i r{(n-p)/2}r{p/2 + 2} 
2 T{n/2 + 2} 

1 



(68) 



n(n + 2) 

Therefore, (49) and (50) are proved for the case when n > p. This completes the proof of Lemma 2. 



Lemma 3. Let S be the sample covariance of a set of p-dimensional vectors {xj}™ =1 . If {xj}™ =1 are 
i.i.d. Gaussian vectors with covariance 



E 



{ll x *ll 2 S} 



n 



n + 2 



2Tr(S 2 ) + Tr 2 (S) 



Proof: For simplicity, we work with the scaled covariance matrix M defined as 

n 

M = ^x iX f = nS, 

i=i 

and calculate E j 1 1 1 1 2 1 M j instead of E j 1 1 1 1 2 1 s|. We are then going to prove that 

E { IM2 I M l = ^TY) (m (M2) + ^ (M)) • 



(69) 



(70) 



n (n + 2) 

We use Lemma 1 and Lemma 2 to establish (71) 



(71) 
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Let X and (H, A, Q) be matrices denned in Definition 1. Let q be the 2-th column of Q denned in 
Definition 1. Then 

x, = HAq. (72) 



Let 



Then 



and 



Therefore we have 



D = A . 



M = XX T = HA 2 H T = HDH T , 



xf x; = q T A T H T HAq = q T Dq. 



^{Hx.I^m} =£{(q r Dq) 2 |M}. 



(73) 



(74) 



(75) 



(76) 



According to Lemma 1, Q is independent of H and A. Since q is a function of Q, M and D are 
functions of H and A, q is independent of M and D. 
From the law of total expectation, 



(q T Dq) 2 |lVl} =^{^{(q T Dq) 2 M,d}|m}. 



(77) 



Expand q T Dq as 

n 

q r Dq = ^g 2 , (78) 
J'=l 

where dj is the j-th diagonal element of D. Since q is independent of M and D, according to Lemma 

2, 

£{(q T Dq) 2 |M,D} 



j=i 



M. D 



j=l j^k 



(79) 



1 



n (n + 2) 



n (n + 2) 



j=l j^k 

(2Tr (D 2 ) + Tr 2 (D)) . 
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Since Tr (D) = Tr (M) and TV (D 2 ) = Tr (M 2 ), substituting (79) into (77), we have 

£{(q T Dq) 2 |lVl} 



= E 
= E 



1 



n (n + 2 
1 



(2TV (D 2 ) + Tr 2 (D)) 



— (2Tr (M 2 ) + Tr 2 (M)) 

n(n + 2) v v ; v n 



M 
M 



= — (2Tr (M 2 ) + Tr 2 (M)) 

n(n + 2) v v ; v " 

Lemma 3 now allows us to prove Theorem 2. 



A. Proof of Theorem 2 
Proof: 



=E {(1 - p LW )S + p lw f\s} 

= (i-e{p lw \s}}s + e{p lw f\s} 

Therefore we obtain the shrinkage coefficient of T^rblw'- 

PRBLW =E | pL\v \ S| 



_ i=i i 


| Xjxf - S 


2 
F 




n 2 


Tr (§ 2 ) - Tr 2 (s) /p 



Note that 



i=i 

= ^^{l|x,|l2|s}-nTr(S 2 ). 



From Lemma 3, we have 



XiX,- S 



n(n — 2) 



Tr (S 2 ) + -^-Tr 2 



n + 2 V J n+2 
Equation (17) is then obtained by substituting (84) into (82). 



(80) 



(81) 



(82) 



(83) 



(84) 
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